Consolidated Trees: An Analysis of Structural Convergence

Authors

  • Jesús M. Pérez
  • Javier Muguerza
  • Olatz Arbelaitz
  • Ibai Gurrutxaga
  • José Ignacio Martín

Abstract

When different subsamples of the same data set are used to induce classification trees, the resulting classifiers differ greatly in structure. The structural stability of the tree is of capital importance in many domains where the classifier must be comprehensible, such as illness diagnosis, fraud detection in different fields and customer behaviour analysis (marketing). We have developed a methodology for building classification trees from multiple samples in which the final classifier is a single decision tree (Consolidated Trees). This paper analyses the structural stability of our algorithm compared with the C4.5 algorithm. When resampling techniques are used, the classification trees generated with our algorithm achieve smaller error rates and are structurally steadier than those built by C4.5. The main focus of this paper is to show how Consolidated Trees built with different sets of subsamples tend to converge to the same tree as the number of subsamples increases.
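The abstract describes the consolidation idea only at a high level. The sketch below is a minimal Python illustration of growing one tree from several subsamples by letting the subsamples vote on each split. It assumes categorical attributes, plain information gain as the split criterion (C4.5 itself uses gain ratio) and a simple plurality vote; these choices and every function name (`best_attribute`, `draw_subsamples`, `build_consolidated`, `predict`) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

# Hypothetical sketch, NOT the authors' CTC implementation.
# A sample is a list of (row, label) pairs; a row maps attribute name -> categorical value.


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_attribute(sample, attributes):
    """Attribute with the highest information gain on one sample, or None if nothing gains."""
    labels = [y for _, y in sample]
    base = entropy(labels)
    best, best_gain = None, 1e-12  # a split must gain strictly more than ~0 bits
    for a in attributes:
        groups = {}
        for row, y in sample:
            groups.setdefault(row[a], []).append(y)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:  # ties keep the earlier attribute
            best, best_gain = a, base - remainder
    return best


def draw_subsamples(data, n_subsamples, fraction=0.7, seed=0):
    """Assumed resampling scheme: subsamples drawn without replacement."""
    rng = random.Random(seed)
    k = max(1, int(len(data) * fraction))
    return [rng.sample(data, k) for _ in range(n_subsamples)]


def build_consolidated(subsamples, attributes):
    """Grow ONE tree; at each node the subsamples vote on the split attribute."""
    active = [s for s in subsamples if s]  # skip subsamples with no rows at this node
    pooled = [y for s in active for _, y in s]
    majority_class = Counter(pooled).most_common(1)[0][0]
    # Every subsample proposes its own best attribute (None means "do not split").
    votes = Counter(best_attribute(s, attributes) for s in active)
    winner = votes.most_common(1)[0][0]
    if winner is None:
        return majority_class  # leaf: the plurality of subsamples proposes no split
    # The agreed split is applied to every subsample, keeping them aligned.
    values = sorted({row[winner] for s in active for row, _ in s})
    remaining = [a for a in attributes if a != winner]
    children = {
        v: build_consolidated([[(r, y) for r, y in s if r[winner] == v] for s in active], remaining)
        for v in values
    }
    return ("split", winner, children, majority_class)


def predict(tree, row):
    """Walk the single consolidated tree down to a leaf label."""
    while isinstance(tree, tuple):
        _, attr, children, default = tree
        if row.get(attr) not in children:
            return default  # unseen value: use the node's majority class
        tree = children[row[attr]]
    return tree


if __name__ == "__main__":
    data = [
        ({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "sunny", "windy": "yes"}, "play"),
        ({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "rain", "windy": "yes"}, "stay"),
        ({"outlook": "rain", "windy": "no"}, "stay"),
        ({"outlook": "rain", "windy": "yes"}, "stay"),
    ]
    tree = build_consolidated(draw_subsamples(data, n_subsamples=10, seed=42), ["outlook", "windy"])
    print(tree)
    print(predict(tree, {"outlook": "sunny", "windy": "yes"}))  # -> play
```

Because every node's split is agreed on by all subsamples at once, the output is a single readable tree rather than an ensemble. Under this sketch's assumptions, adding more subsamples makes each vote less sensitive to any one subsample, which is one plausible intuition for the convergence the paper studies.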

Similar resources

Consolidated Tree Construction Algorithm: Structurally Steady Trees

This paper presents a new methodology for building decision trees or classification trees (the Consolidated Trees Construction algorithm) that addresses the instability this paradigm shows when small variations occur in the training set. As a consequence, the comprehensibility of the classification made is not lost, which distinguishes this technique from techniques such as bagging and b...

The eccentric connectivity index of bucket recursive trees

If $G$ is a connected graph with vertex set $V$, then the eccentric connectivity index of $G$, $\xi^c(G)$, is defined as $\sum_{v \in V(G)} \deg(v)\,\mathrm{ecc}(v)$, where $\deg(v)$ is the degree of a vertex $v$ and $\mathrm{ecc}(v)$ is its eccentricity. In this paper we establish convergence in probability and asymptotic normality results for this index in random bucket recursive trees.

Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance

This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples but without loss of explanatory capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a car insurance company. This domain has two important characteristics: the explanation given for the classification made is critical to help inves...

Dynamics and Structural characteristics of a natural unlogged oriental beech (Fagus orientalis Lipsky) stand during a 5-years period in Shast Kalate Forest, Northern Iran

Investigation of the structure and dynamics of natural forest ecosystems is an important issue for silvicultural decisions. The aim of this study is to analyse the dynamics and structure of a beech stand over a 5-year period in the Shast Kalateh forest in the Caspian region, North of Iran. Data were collected from a 16.9 ha permanent research plot established in a natural unlogged stand in 2006. All li...

The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm

In many pattern recognition problems, the explanation of the classification made becomes as important as the good performance of the classifier in terms of its discriminating capacity. For this kind of problem we can use the Consolidated Trees’ Construction (CTC) algorithm, which uses several subsamples to build a single tree. This paper presents a broad analysis of the behavior of the CTC algorithm for ...

Publication date: 2006